Improving Biomedical Text Categorisation with NLP

Author

  • Michael Matthews
Abstract

Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that the techniques are not adequate in some biological domains. Here we examine how chunking, named-entity recognition and relationship extraction can be combined with traditional text categorisation techniques to improve the classification of documents containing protein-protein interactions.

Conclusions: A system that combines the output of an NLP system with the standard techniques of text categorisation can produce results that exceed the performance of either system on its own. The F1 of a system that combined features of an NLP system with standard text categorisation features was 68.1 compared with 62.0 using text categorisation alone and 61.9 using relationship extraction alone.
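The idea of combining bag-of-words features with NLP-derived features, and scoring the result with F1, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the feature-name prefixes (`bow=`, `nlp=`), and the specific NLP features (protein mention counts and extracted interaction pairs) are assumptions chosen for illustration only, and the named-entity and relationship-extraction outputs are assumed to be supplied by some upstream system. Note that this `f1_score` returns a value between 0 and 1, whereas the abstract reports F1 on a 0-100 scale.

```python
from collections import Counter


def bag_of_words(text):
    """Count lower-cased whitespace tokens, prefixed to avoid name clashes."""
    return Counter(f"bow={tok}" for tok in text.lower().split())


def nlp_features(entities, relations):
    """Hypothetical NLP-derived features (an assumption, not the paper's set).

    entities:  protein names found by a named-entity recogniser
    relations: (protein_a, protein_b) pairs from a relationship extractor
    """
    feats = Counter()
    feats["nlp=protein_mentions"] = len(entities)
    feats["nlp=interaction_pairs"] = len(relations)
    for a, b in relations:
        # Order the pair alphabetically so (A, B) and (B, A) share one feature.
        feats[f"nlp=pair:{min(a, b)}|{max(a, b)}"] += 1
    return feats


def combined_features(text, entities, relations):
    """Merge both feature sets into a single sparse feature vector."""
    feats = bag_of_words(text)
    feats.update(nlp_features(entities, relations))
    return feats


def f1_score(gold, predicted):
    """Harmonic mean of precision and recall for binary labels (0..1 scale)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The combined feature dictionary could then be passed to any standard classifier that accepts sparse features; which classifier the paper used is not stated in the abstract.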

Similar resources

Deep Belief Networks and Biomedical Text Categorisation

We evaluate the use of Deep Belief Networks as classifiers in a text categorisation task (assigning category labels to documents) in the biomedical domain. Our preliminary results indicate that compared to Support Vector Machines, Deep Belief Networks are superior when a large set of training examples is available, showing an F-score increase of up to 5%. In addition, the training times for DBN...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

Anaphora Resolution: To What Extent Does It Help NLP Applications?

Papers discussing anaphora resolution algorithms or systems usually focus on the intrinsic evaluation of the algorithm/system and not on the issue of extrinsic evaluation. In the context of anaphora resolution, extrinsic evaluation concerns the impact of an anaphora resolution module on a larger NLP system of which it is part. In this paper we explore the extent to which the well-known anaphora...

NLP-NG - A New NLP System for Biomedical Text Analysis

NLP-NG is a new NLP system consisting of three components: NG-CORE (language processing), NG-DB (database management), and NG-SEE (interactive visualization and entry). The ultimate goal of NLP-NG is to produce information retrieval systems in which users can choose full-text schema, adding specific items to focus their queries. Schema are created by a normalization process which elides adjunct...

UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text

Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration ...

Publication date: 2006